Bitplanes Block Based Exact Image Compression
Abstract: In this paper, an exact image compression method based on bit-plane blocking is proposed. The proposed algorithm uses two-bit codes for block representation. The codes represent the states of Unicode and non-Unicode blocks. The algorithm further divides each non-Unicode block. The block division continues until the smallest block size is reached, and blocks of that size are kept as residuals. The smallest block size in the study is two by two. The main encoding process consumes three codes. A subsequent process uses the fourth code for further compression. The resultant file is subject to further exact compression. The compression technique considered in this study is Huffman coding. The implementation complexity of compression and decompression is comparable with that of well-known methods, as is the compression ratio. Parallelizing the algorithm is straightforward and depends on the number of planes. Within a plane, the hardware realization of the process is simple and does not require special hardware.
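The block division described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a "Unicode" block is one whose bits are all identical, that each plane is square with a power-of-two side, and that the three code values (`ALL_ZERO`, `ALL_ONE`, `MIXED`) are hypothetical, since the abstract does not give the actual code assignment. The fourth code and the Huffman stage are omitted.

```python
def bit_planes(image, depth=8):
    """Split a 2-D list of non-negative integers into `depth` binary planes (LSB first)."""
    return [[[(pixel >> b) & 1 for pixel in row] for row in image]
            for b in range(depth)]


# Hypothetical two-bit codes; the real assignment is not stated in the abstract.
ALL_ZERO, ALL_ONE, MIXED = "00", "01", "10"


def encode_block(plane, top, left, size, min_size=2):
    """Recursively encode one square block of a binary plane.

    Returns (codes, residuals): the two-bit codes in depth-first order and
    the raw bit strings of mixed blocks that reached the smallest size.
    """
    bits = [plane[r][c]
            for r in range(top, top + size)
            for c in range(left, left + size)]
    if not any(bits):
        return [ALL_ZERO], []
    if all(bits):
        return [ALL_ONE], []
    if size <= min_size:
        # Smallest mixed block: keep its bits as a residual (an assumption).
        return [MIXED], ["".join(map(str, bits))]
    codes, residuals = [MIXED], []
    half = size // 2
    for dr in (0, half):
        for dc in (0, half):
            sub_codes, sub_res = encode_block(plane, top + dr, left + dc,
                                              half, min_size)
            codes += sub_codes
            residuals += sub_res
    return codes, residuals


plane = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 0],
         [0, 0, 0, 1]]
codes, residuals = encode_block(plane, 0, 0, 4)
print(codes, residuals)
```

Because each plane is encoded independently in this sketch, the planes returned by `bit_planes` could be processed in parallel, which matches the abstract's remark that parallelization depends on the number of planes.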
A simple encoder scheme for distributed residual video coding.
Rate-Distortion (RD) performance of Distributed Video Coding (DVC) is considerably lower than that of conventional predictive video coding. To reduce the performance gap, many methods and techniques have been proposed to improve the coding efficiency of DVC at the cost of increased system complexity. In particular, techniques employed at the encoder, such as encoder mode decisions, optimal quantization and hash methods, increase the complexity of the encoder. However, a low-complexity encoder is a widely desired feature of DVC. To improve the coding efficiency while maintaining a low-complexity encoder, this paper focuses on the Distributed Residual Video Coding (DRVC) architecture and proposes a simple encoder scheme. The main contributions of this paper are as follows: 1) a bit-plane block based method combined with bit-plane re-arrangement, which improves the dependency between the source and the Side Information (SI) and reduces the amount of data to be channel encoded; 2) a simple iterative dead-zone quantizer with 3 levels that adjusts quantization from coarse to fine. The simulation results show that the proposed scheme outperforms the DISCOVER scheme for low to medium motion video sequences in terms of RD performance, while maintaining a low-complexity encoder.
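A three-level dead-zone quantizer refined from coarse to fine can be sketched in Python as follows. This is only an illustration of the general idea, not the scheme from the paper. The halving of the threshold on each pass, the reconstruction rule, and the function names `deadzone_quantize` and `iterative_quantize` are all assumptions.

```python
def deadzone_quantize(x, threshold):
    """Three-level dead-zone quantizer: returns -1, 0 or +1."""
    if x > threshold:
        return 1
    if x < -threshold:
        return -1
    return 0  # values inside the dead zone map to zero


def iterative_quantize(x, threshold, passes):
    """Quantize x over several passes, halving the threshold each time.

    Returns (symbols, reconstruction). Halving the threshold and stepping
    the reconstruction by one threshold per pass are assumptions.
    """
    symbols, recon = [], 0.0
    for _ in range(passes):
        q = deadzone_quantize(x - recon, threshold)
        symbols.append(q)
        recon += q * threshold
        threshold /= 2
    return symbols, recon


print(iterative_quantize(13, 8, 4))
print(iterative_quantize(-5, 8, 3))
```

Each pass emits one ternary symbol, so a decoder that stops after fewer passes still obtains a coarser reconstruction of the same value.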
Extended Bit-Plane Compression for Convolutional Neural Network Accelerators
After the tremendous success of convolutional neural networks in image
classification, object detection, speech recognition, etc., there is now rising
demand for deployment of these compute-intensive ML models on tightly power
constrained embedded and mobile systems at low cost as well as for pushing the
throughput in data centers. This has triggered a wave of research towards
specialized hardware accelerators. Their performance is often constrained by
I/O bandwidth and the energy consumption is dominated by I/O transfers to
off-chip memory. We introduce and evaluate a novel, hardware-friendly
compression scheme for the feature maps present within convolutional neural
networks. We show that an average compression ratio of 4.4x relative to
uncompressed data and a gain of 60% over existing methods can be achieved for
ResNet-34 with a compression block requiring <300 bits of sequential cells and
minimal combinational logic.
A content-aware quantisation mechanism for transform domain distributed video coding
The discrete cosine transform (DCT) is widely applied in modern codecs to remove spatial redundancies, with the resulting DCT coefficients being quantised to achieve compression as well as bit-rate control. In distributed video coding (DVC) architectures like DISCOVER, DCT coefficient quantisation is traditionally performed using predetermined quantisation matrices (QM), which means the compression is heavily dependent on the sequence being coded. This makes bit-rate control challenging, with the situation exacerbated in the coding of high resolution sequences due to QM scarcity and the non-uniform bit-rate gaps between them. This paper introduces a novel content-aware quantisation (CAQ) mechanism to overcome the limitations of existing quantisation methods in transform domain DVC. CAQ creates a frame-specific QM to reduce quantisation errors by analysing the distribution of DCT coefficients. In contrast to the predetermined QM, which is applicable only to 4x4 block sizes, CAQ produces QM for larger block sizes to enhance compression at higher resolutions. This provides superior bit-rate control and better output quality by seeking to fully exploit the available bandwidth, which is especially beneficial in bandwidth constrained scenarios. In addition, CAQ generates superior perceptual results by innovatively applying different weightings to the DCT coefficients to reflect the human visual system. Experimental results corroborate that CAQ both quantitatively and qualitatively provides enhanced output quality in bandwidth limited scenarios, by consistently utilising over 90% of available bandwidth.
A Novel Latin Square Image Cipher
In this paper, we introduce a symmetric-key Latin square image cipher (LSIC)
for grayscale and color images. Our contributions to the image encryption
community include 1) we develop new Latin square image encryption primitives
including Latin Square Whitening, Latin Square S-box and Latin Square P-box ;
2) we provide a new way of integrating probabilistic encryption in image
encryption by embedding random noise in the least significant image bit-plane;
and 3) we construct LSIC with these Latin square image encryption primitives
all on one keyed Latin square in a new loom-like substitution-permutation
network. Consequently, the proposed LSIC achieves many desired properties of a
secure cipher including a large key space, high key sensitivities, uniformly
distributed ciphertext, excellent confusion and diffusion properties,
semantic security, and robustness against channel noise. Theoretical analysis
shows that the LSIC has good resistance to many attack models including
brute-force attacks, ciphertext-only attacks, known-plaintext attacks and
chosen-plaintext attacks. Experimental analysis under extensive simulation
results using the complete USC-SIPI Miscellaneous image dataset demonstrates
that LSIC outperforms or reaches the state of the art suggested by many peer
algorithms. All these analyses and results demonstrate that the LSIC is very
suitable for digital image encryption. Finally, we open source the LSIC MATLAB
code under webpage https://sites.google.com/site/tuftsyuewu/source-code.
Comment: 26 pages, 17 figures, and 7 tables
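The idea of embedding random noise in the least significant bit-plane, mentioned in this abstract, can be illustrated with a short Python sketch. It is not the LSIC code and contains no encryption step. The function name `embed_lsb_noise` is hypothetical, and `random.Random` is used only for a reproducible demonstration; it is not a cryptographically secure source.

```python
import random


def embed_lsb_noise(pixels, rng):
    """Replace the least significant bit of each pixel with a random bit.

    `pixels` is a flat list of non-negative integers (e.g. 8-bit grey levels).
    """
    return [(p & ~1) | rng.getrandbits(1) for p in pixels]


rng = random.Random(0)  # demonstration only; not cryptographically secure
original = [200, 17, 64, 255, 0]
noisy = embed_lsb_noise(original, rng)
print(noisy)
```

Every output pixel differs from its input by at most one grey level, so the visual change is negligible while repeated runs on the same image yield different inputs to the cipher.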
A VLSI architecture of JPEG2000 encoder
Copyright © 2004 IEEE. This paper proposes a VLSI architecture of a JPEG2000 encoder, which functionally consists of two parts: discrete wavelet transform (DWT) and embedded block coding with optimized truncation (EBCOT). For DWT, a spatial combinative lifting algorithm (SCLA)-based scheme with both 5/3 reversible and 9/7 irreversible filters is adopted to reduce multiplication computations by 50% and 42%, respectively, compared with the conventional lifting-based implementation (LBI). For EBCOT, a dynamic memory control (DMC) strategy of Tier-1 encoding is adopted to reduce the scale of the on-chip wavelet coefficient storage by 60%, and a subband parallel-processing method is employed to speed up the EBCOT context formation (CF) process; an architecture of Tier-2 encoding is presented to reduce the scale of on-chip bitstream buffering from full-tile size down to three-code-block size and considerably eliminate the iterations of the rate-distortion (RD) truncation. This work was supported in part by the China National High Technologies Research Program (863) under Grant 2002AA1Z142.
Energy Beamforming with One-Bit Feedback
Wireless energy transfer (WET) has attracted significant attention recently
for providing energy supplies wirelessly to electrical devices without the need
of wires or cables. Among different types of WET techniques, the radio
frequency (RF) signal enabled far-field WET is most practically appealing to
power energy constrained wireless networks in a broadcast manner. To overcome
the significant path loss over wireless channels, multi-antenna or
multiple-input multiple-output (MIMO) techniques have been proposed to enhance
the transmission efficiency and distance for RF-based WET. However, in order to
reap the large energy beamforming gain in MIMO WET, acquiring the channel state
information (CSI) at the energy transmitter (ET) is an essential task. This
task is particularly challenging for WET systems, since existing channel
training and feedback methods used for communication receivers may not be
implementable at the energy receiver (ER) due to its hardware limitation. To
tackle this problem, in this paper we consider a multiuser MIMO system for WET,
where a multiple-antenna ET broadcasts wireless energy to a group of
multiple-antenna ERs concurrently via transmit energy beamforming. By taking
into account the practical energy harvesting circuits at the ER, we propose a
new channel learning method that requires only one feedback bit from each ER to
the ET per feedback interval. The feedback bit indicates the increase or
decrease of the harvested energy by each ER between the present and previous
intervals, which can be measured without changing the existing hardware at the
ER. Based on such feedback information, the ET adjusts transmit beamforming in
different training intervals and at the same time obtains improved estimates of
the MIMO channels to ERs by applying a new approach termed analytic center
cutting plane method (ACCPM).
Comment: This is the longer version of a paper to appear in IEEE Transactions
on Signal Processing
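The one-bit feedback rule described in this abstract, where each ER reports whether its harvested energy increased or decreased relative to the previous interval, can be sketched in Python. The function name `feedback_bits`, the encoding of an increase as 1, and the treatment of an unchanged value as 0 are assumptions. The ACCPM-based channel estimation at the ET is not shown.

```python
def feedback_bits(harvested):
    """Map per-interval harvested energies to one feedback bit per interval.

    Bit is 1 if the energy increased relative to the previous interval and
    0 otherwise (treating "unchanged" as 0 is an assumption).
    """
    return [1 if cur > prev else 0
            for prev, cur in zip(harvested, harvested[1:])]


energies = [1.0, 1.5, 1.2, 1.2, 2.0]  # illustrative values only
print(feedback_bits(energies))
```

The first interval produces no bit because there is no earlier measurement to compare against.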
Evaluation of GPU/CPU Co-Processing Models for JPEG 2000 Packetization
With the bottom-line goal of increasing the
throughput of a GPU-accelerated JPEG 2000 encoder, this paper
evaluates whether the post-compression rate control and
packetization routines should be carried out on the CPU or on
the GPU. Three co-processing models that differ in how the
workload is split among the CPU and GPU are introduced. Both
routines are discussed and algorithms for executing them in
parallel are presented. Experimental results for compressing a
detail-rich UHD sequence to 4 bits/sample indicate speed-ups of
200x for the rate control and 100x for the packetization
compared to the single-threaded implementation in the
commercial Kakadu library. These two routines executed on the
CPU take 4x as long as all remaining coding steps on the GPU
and therefore present a bottleneck. Even if the CPU bottleneck
could be avoided with multi-threading, it is still beneficial to
execute all coding steps on the GPU as this minimizes the
required device-to-host transfer and thereby speeds up the
critical path from 17.2 fps to 19.5 fps for 4 bits/sample and to
22.4 fps for 0.16 bits/sample.